This is a set of tools to do the following: check how a color palette performs in terms of luminance, check how it performs in various scenarios, and produce a clustered color palette given a number of colors, a colorspace, a gamut and a clustering method. To map a continuous variable to a color palette it is important to use an ordered set of colors that is perceptually uniform, i.e. with linear changes in luminance. ColorRampPalette palettes are usually perceptually uniform, but may be slightly boring. Other color palettes that are widely used (e.g. rainbow palette) can exaggerate differences between close values due to their erratic luminance profile. The viridis package has 4 perceptually uniform palettes that are not just color ramps, and make for nice visualizations in heatmaps, heatscatters and other types of visualization where colors map to continuous variables. On the other hand, such a palette is ill suited for unique labels such as the ones we would use to identify different samples (barplots, boxplots) or clusters (PCA, tSNE, UMAP).
We first load the viridis library and a table with the coordinates for 2 tSNE components:
require(viridis)
tsne <- read.delim("tsnecoords.tab.bin", header = T, stringsAsFactors = F)
tsne <- tsne[,4:5]
luminateWe then define the first function, luminate, which shows the luminance and brightness profiles for the ordered set of colors in the palette, together with the palette itself and color names (or HEX values). The desaturate argument can be used to check for the same characteristics in a grayscale format, using the function desaturate from the colorspace package.
#' Luminance and brightness profile of color palettes
#' Plots colors with their brightness and luminance values
#' @param colors vector of characters with the colors to plot
#' @param cex numeric indicating the text size for color labels
#' @param desaturate logical: should colors be desaturated (converted to grayscale)?
#' @return A plot of a color palette with their brightness and luminance values, the colors themselves and their color names. Color names are plot only if length(colors) <= 20, and dots are shown only if length(colors) <= 50.
luminate <- function(colors, cex = 2, desaturate = F)
{
require(colorspace)
if(desaturate == T)
{
colors = desaturate(colors)
}
l <- length(colors)
lums <- rep(0,l)
for (i in 1:l)
{
lums[i] <- col2rgb(colors[i])[1]*0.299 + col2rgb(colors[i])[2]*0.587 + col2rgb(colors[i])[3]*0.114
}
brightness <- rep(0,l)
for (i in 1:l)
{
brightness[i] <- col2rgb(colors[i])[1]*0.2126 + col2rgb(colors[i])[2]*0.7152 + col2rgb(colors[i])[3]*0.0722
}
ytop <- max(range(c(-60,range(lums), range(brightness))) + 30)
plot(
x = 1:l,
y = rep(-15,l),
ylim = c(-60,ytop),
cex = cex,
pch = 15,
col = colors,
xaxt = "n",
yaxt = "n",
ylab = "luminance and brightness",
xlab = NA,
bty = "n",
main = paste("Perceptual profile, n = ", l, sep = "")
)
lines(
x = 1:l,
y = lums
)
lines(
x = 1:l,
y = brightness,
col = "gray"
)
if(l <= 50)
{
points(
x = 1:l,
y = lums,
pch = 16,
col = colors,
)
points(
x = 1:l,
y = brightness,
pch = 17,
col = colors
)
legend(
"top",
bty="n",
lwd=1,
col=c("black", "gray"),
pch=c(16, 17),
legend=c("Luminance", "Brightness")
)
}
else if(l > 50)
{
legend(
"top",
bty="n",
lwd=1,
col=c("black", "gray"),
legend=c("Luminance", "Brightness")
)
}
ticks <- seq(0, max(lums), by = 20)
axis(2, at=ticks, labels=ticks)
if (l <= 20)
{
text(labels = colors, srt = 90, x = 1:l, y = -50, cex = 0.7)
}
}
Let’s look at the luminate output of the viridis “C” palette:
luminate(viridis(20, option ="C"))
And now with desaturation (notice how luminance and brightness are overlapping):
luminate(viridis(20, option ="C"), desaturate = T)
Let’s now look at the luminate output of the rainbow palette:
luminate(rainbow(n = 20))
And with desaturation:
luminate(rainbow(n = 20), desaturate = T)
multiplottestEven though perceptually uniform palettes are very important for continuous variables, they are not ideal for distinguishing samples in barplots, boxplots, line charts or in 2D representations in which clusters are important, such as tSNE plots from single cell data.
We write another function, multiplottest, that shows how a palette performs in various use cases using simulated data and real single cell data. This function simulates values for a barplot, a heatmap, some line graphs and uses a tSNE plot from a real single cell dataset. In the tSNE plot, clusters are artificially created by hierarchical clustering on the tSNE coordinates, so that we always have 1 cluster per color.
#' Multi plot test for color palettes
#' A panel with 4 plots (barplot, line chart, heatmap and tSNE) coloured with a user-supplied palette
#' @param colors vector of characters with colors
#' @param randomize logical: should the order of colors in the palette be randomized? Only applies to barplot and line chart.
#' @return 4 plots with the supplied colors.
multiplottest <- function(colors, randomize = F)
{
layout(matrix(1:4, nrow = 2))
if(randomize == T)
{
barplot(sample(1:length(colors), length(colors)), col = colors[sample(1:length(colors), length(colors), replace = F)], border = NA)
}
else
{
barplot(sample(1:length(colors), length(colors)), col = colors, border = NA)
}
hmap = matrix(rep(0,length(colors)*length(colors)), ncol = length(colors))
for(i in 1:length(colors)) hmap[,i] = runif(length(colors), i, i*3)
hr <- hclust(as.dist(hmap), method="complete")
hc <- hclust(as.dist(t(hmap)), method="complete")
hm2 = as.matrix(hmap[hr$order, hc$order])
image(t(as.matrix(hm2)), col = colors, xaxt = "n", yaxt = "n")
plot(x = 1:length(colors), y = rep(10, length(colors)), col = colors[1], cex = 0, ylim = c(0, length(colors)+3), xlab = NA, ylab = NA)
if(randomize == T)
{
randomcols = colors[sample(1:length(colors), length(colors))]
for(i in 1:length(colors)) lines(x = 1:length(colors), y = runif(length(colors), i, i+2.5), col = randomcols[i], lwd = 2)
} else {
for(i in 1:length(colors)) lines(x = 1:length(colors), y = runif(length(colors), i, i+2.5), col = colors[i], lwd = 2)
}
tmp = dist(tsne)
hc = hclust(tmp)
cutt = cutree(hc, k = length(colors))
dat = as.data.frame(cbind(tsne,cutt))
plot(dat[,1], dat[,2], col = colors[dat[,3]], pch = 16, xlab = NA, ylab = NA)
}
We can see how viridis palettes perform well for heatmaps, but not for any other categorical variable
multiplottest(viridis(25, option = "D"))
Whereas a unique palette (taken from here) is very good for categorical variables, but horrible for heatmaps:
uniquecols =c("#e6194B", "#3cb44b", "#ffe119", "#4363d8", "#f58231", "#911eb4", "#42d4f4", "#f032e6", "#bfef45", "#fabebe", "#469990", "#e6beff", "#9A6324", "#fffac8", "#800000", "#aaffc3", "#808000", "#ffd8b1", "#000075", "#a9a9a9")
multiplottest(uniquecols)
Notice that we removed black and white colors from the original palette. # Multiplot inspector -
multiplottest
There are some interesting ways to generate palettes with diverse colors automatically. There’s more than a century of theoretical and experimental research on how colors are perceived and what color spaces can be constructed based on changes in color components. As an interesting experiment that resulted in a useful tool, Mathieu Jacomy at the Sciences Po Média Lab developed unique palettes using clustering on color spaces. As an exercise and in order to port some of his results to R, I will try to reproduce his workflow just by looking at how he developed it. Let’s first build the simplest color space, RGB. The RGB color space is a cube drawn by stepwise increments in R, G and B values. RGB values are interpreted by R in the 0 - 1 interval, whereas they are normally defined between 0 and 255, so they will need to be converted. We can populate a 3D array with a simple loop, which is feasible for small numbers of steps (in our case, 64).
rgbmat = array(0, dim = c(64,64,64))
for(i in 1:64)
{
for(j in 1:64)
{
for(q in 1:64)
{
rgbmat[i,j,q] = rgb(seq(0,255,length.out = 64)[i]/255, seq(0,255,length.out = 64)[j]/255, seq(0,255,length.out = 64)[q]/255)
}
}
}
To simplify plotting the 3D matrix, we melt (from reshape2) into a long format with 3 variables, which will be our X, Y and Z:
mm = reshape2::melt(rgbmat)
mm$value = as.character(mm$value)
We can now visualize the cube using scatter3d from the plot3D package:
layout(matrix(1:2, nrow = 1))
plot3D::scatter3D(mm[,1],mm[,2],mm[,3], colvar = NULL, col = as.character(mm$value))
plot3D::scatter3D(mm[,1],mm[,2],mm[,3], colvar = NULL, col = as.character(mm$value), theta = 180)
Notice how on one diagonal of the cube (from black vertex to white vertex, i.e. from RGB(0,0,0) to RGB(1,1,1) lies the grayscale:
diagonal = 1:64
diagcol = vector()
for(i in 1:64) diagcol[i] = mm[which(mm[,1] == diagonal[i] & mm[,2] == diagonal[i] & mm[,3] == diagonal[i]),4]
plot3D::scatter3D(diagonal, diagonal, diagonal, colvar= NULL, col = diagcol, pch = 16)
We can now apply k-means clustering (using the function kmeans from the stats package) to cluster the 3D space in k subspaces, of which we will take the colors lying at the coordinates defined by cluster centroids. Importantly, k-means center coordinates must be rounded so that we can find them again in our original 3D array. This will generate some approximations. Let’s start with 8 different colors:
km = stats::kmeans(mm[,1:3], centers = 8)
rkm = round(km$centers[,1:3])
cols = vector()
mm$value = as.character(mm$value)
for(i in 1:8) cols[i] = mm[which(mm[,1] == rkm[i,1] & mm[,2] == rkm[i,2] & mm[,3] == rkm[i,3]),4]
The colors will be inside the cube, but where are they exactly?
plot3D::scatter3D(rkm[,1], rkm[,2], rkm[,3], colvar= NULL, col = as.character(cols), pch = 16, xlim = c(0,64), ylim = c(0,64), zlim = c(0,64), cex =3)
Let’s see how the clustering performed:
multiplottest(cols)
These colors are quite distinguishable, but elicit a mixed feeling of despair and nostalgia.
We can try another clustering method, CLARA, from the cluster package. CLARA uses partitioning around medoids (PAM) after a random sampling of the space. Notice that since this method uses medoids, which are actual points in the space, we don’t need to round them to find the points in the array.
claram = cluster::clara(mm[,1:3], k = 8)
claram = claram$medoids
cols = vector()
for(i in 1:8) cols[i] = mm[which(mm[,3] == claram[i,1] & mm[,1] == claram[i,2] & mm[,2] == claram[i,3]),4]
Let’s see our new colors in the cube:
plot3D::scatter3D(claram[,1], claram[,2], claram[,3], colvar= NULL, col = as.character(cols), pch = 16, xlim = c(0,64), ylim = c(0,64), zlim = c(0,64), cex = 3)
Colors are still inside the cube, but not as evenly distributed as in k-means. Let’s see how this palette performs:
multiplottest(cols)
Much more astethically pleasing than k-means.
However, the RGB space is not perceptually uniform, and may generate colors that are not ideal for their differences (or similarities). For this reason, it may be much better in terms of perceptuality and pleasantness of the colors to use the CIE XYZ 1931 color space. It is still not completely perceptually uniform (whereas CIE Lab 1976 is more perceptually uniform), but can be used in practice for color definition. An important caveat about the representation on monitors of colorspaces is that, even if the colorspace spans a certain array of values, devices can only faithfully display a subspace of this array. This subspace is called the gamut of the colorspace. The CIE 1931 colorspace is a 3-dimensional space, but for visualization purposes we flatten it by keeping a constant luminance value, and plotting the other components. A very handy package, SpecHelpers, contains data and functions to draw the CIE 1931 chromaticity diagram for a given exposure (luminance) value. We tart by plotting the CIE 1931 chromaticity diagram using the plotCIEchrom function included in the package, using the sRGB colorspace and the exposure of 1.3.
library(SpecHelpers)
library(grid)
library(raster)
SpecHelpers::plotCIEchrom(gradient = "sl", colorspace = "sRGB", ex = 1.3)
The contour of the chromaticity diagram is drawn by the xy coordinates given by visible light, therefore is called the spectral locus. Colors lying on the spectral locus are “pure” colors, and they are very hard to render by devices. The rest of the colors within the diagram is obtained by making a gradient of colors from the spectral locus. The D65 point is the white reference, which influences the whole colorspace and changes with changes in luminance. This is the colorspace that we want to cluster in order to find centroids or medoids and obtain a unique palette, using an approach similar to the one developed for the RGB cube.
By looking into the code for plotCIEchrom, we find out how the object is constructed. The first operation is the definition of the spectral locus coordinates by looking up a table that contains CIE XYZ coordinates for any given wavelength between 390 and 830, in 1 nm increments. The function subsets this table by taking all the wavelengths below 650 nm.
head(CIExyz)
keep <- which(CIExyz$wavelength <= 650)
Lxyz <- CIExyz[keep,]
The function then calls another function, prepCIEgradient, in which the diagram defined by Lxyz is filled with a gradient from the sRGB color space at a given exposure value. The way this is done is simple:
expand.grid, in a way very similar to what melt does.
inout function is called to determine which points lie inside the chromaticity diagram, and which ones lie outside.
convertColor function from grDevices is used to convert all XYZ coordinates into the sRGB color space.
ciegradient = SpecHelpers::prepCIEgradient(vertices = Lxyz, colSpace = "sRGB", ex = 1.3)
The ciegradient object is a raster, which - as happens for image or other plotting functions - is plotted by rotating the matrix by 90 degrees, anticlockwise. This means that to replot this object in base graphics the raster matrix must be transposed and inverted on the column component:
ciedft = t(as.matrix(as.raster(ciegradient)))
ciedf = ciedft[,ncol(ciedft):1]
We will now select the gamut on which the clustering will be performed. As mentioned earlier, the gamut is a subset of the colorspace that can be rendered by a particular device. The SpecHelpers package provides XYZ coordinates for several gamuts, such as “SWOP” (print CMYK), “sRGB”, “NTSC”, “Adobe”, “Apple” and “CIE”. The color gradient can be obtained in the same way as before, only specifying the coordinates of the gamut rather than the whole chromaticity diagram. We will use the “NTSC” gamut:
gamutv <- SpecHelpers::getGamutValues("NTSC")
gamutdft = t(as.matrix(as.raster(SpecHelpers::prepCIEgradient(vertices = gamutv, colSpace = "sRGB", ex = 1.3))))
gamutdf = gamutdft[,501:1]
We now melt both matrices, as we did for the cube RGB space, and exclude white (#FFFFFF) coordinates from the clustering:
ciemelt = reshape2::melt(ciedf)
ciemelt$value = as.character(ciemelt$value)
ciespace = ciemelt[which(ciemelt$value != "#FFFFFF"),]
gamutmelt = reshape2::melt(gamutdf)
gamutmelt$value = as.character(gamutmelt$value)
gamutspace = gamutmelt[which(gamutmelt$value != "#FFFFFF"),]
Now we have a set of coordinates that can be clustered and plotted easily. First attempt with k-means, 8 colors:
gamutkm = stats::kmeans(gamutspace[,1:2], centers = 8)
gamutkm = round(gamutkm$centers)
colorvec = 1:8
for(i in 1:8) colorvec[i] = gamutspace[which(gamutspace[,1] == gamutkm[i,1] & gamutspace[,2] == gamutkm[i,2]),3]
Let’s have a look at the resulting palette with multiplottest:
multiplottest(colorvec)
We can also visualize the colors on the colorspace, with some fancy annotations. Since XYZ coordinates are defined between 0 and 0.9, we have to rescale all coordinate values. The gamut is painted as a polygon using the convex hull of the points in the gamut. Color annotations are ordered by the y coordinate to avoid tangled lines.
plot((ciemelt[,1]/501)*0.9, (ciemelt[,2]/501)*0.9, col = ciemelt$value, pch = 16, cex = 0.2, xlab = "x", ylab = "y", xlim = c(0,1), main = "8 clustered colors on CIE 1931 colorspace")
orderg = order(gamutkm[,2], decreasing = F)
gamutkm = gamutkm[orderg,]
colorvec = colorvec[orderg]
polygon((gamutspace[chull(gamutspace[,1], gamutspace[,2]),1]/501)*0.9, (gamutspace[chull(gamutspace[,1], gamutspace[,2]),2]/501)*0.9)
text(y = seq(0.1,0.8, length.out = 8), x = rep(0.9, 8), cex = 0.8, labels = colorvec)
for(i in 1:8) segments(x0 = (gamutkm[i,1]/501)*0.9, y0 =(gamutkm[i,2]/501)*0.9, x1 = 0.8, y1 = seq(0.1,0.8, length.out = 8)[i], lwd = 0.5)
points((gamutkm[,1]/501)*0.9, (gamutkm[,2]/501)*0.9, pch = 21, bg = colorvec, col = "black")
text(y = 0.9, x = 0.1, pos = 4, cex = 0.7, labels = "Gamut: NTSC, Exposure: 1.3, clustering method: kmeans")
points(y = seq(0.1,0.8, length.out = 8), x = rep(0.8, 8), cex = 4, pch = 21, col = "black", bg = colorvec)
XYZ_clusterpaletteNow let’s package this in a handy function where we can decide the number of colors, the gamut, the clustering method and the exposure:
#' XYZ Cluster Palette
#' Generates a clustered palette
#' @param k numeric, number of colors to be generated
#' @param gamut character, defines the gamut on the CIE XYZ 1931 space. Must be one of "SWOP", "sRGB", "NTSC", "Apple", "CIE"
#' @param exposure numeric, the luminance of the colorspace
#' @param clusterMethod character, either of "CLARA" or "kmeans"
#' @param plot logical, should the plot on the colorspace be produced?
#' @return a vector of hex values for clustered colors, and optionally a plot
XYZ_clusterpalette = function(k, gamut = "NTSC", exposure = 1.3, clusterMethod = "CLARA", plot = T)
{
keep <- which(CIExyz$wavelength <= 650)
Lxyz <- CIExyz[keep,]
gamutv <- SpecHelpers::getGamutValues(gamut)
ciedft = t(as.matrix(as.raster(SpecHelpers::prepCIEgradient(vertices = Lxyz, colSpace = "sRGB", ex = exposure))))
ciedf = ciedft[,501:1]
gamutdft = t(as.matrix(as.raster(SpecHelpers::prepCIEgradient(vertices = gamutv, colSpace = "sRGB", ex = exposure))))
gamutdf = gamutdft[,501:1]
ciemelt = reshape2::melt(ciedf)
ciemelt$value = as.character(ciemelt$value)
ciespace = ciemelt[which(ciemelt$value != "#FFFFFF"),]
rownames(gamutdf) = 1:nrow(gamutdf)
colnames(gamutdf) = 1:ncol(gamutdf)
gamutmelt = reshape2::melt(gamutdf)
gamutmelt$value = as.character(gamutmelt$value)
gamutspace = gamutmelt[which(gamutmelt$value != "#FFFFFF"),]
if(clusterMethod == "kmeans")
{
gamutkm = stats::kmeans(gamutspace[,1:2], centers = k)
gamutkm = round(gamutkm$centers)
colorvec = vector()
for(i in 1:k) colorvec[i] = gamutspace[which(gamutspace[,1] == gamutkm[i,1] & gamutspace[,2] == gamutkm[i,2]),3]
}
else if(clusterMethod == "CLARA")
{
gamutkm = cluster::clara(gamutspace[,1:2], k)
gamutkm = gamutkm$medoids
colorvec = gamutspace[rownames(gamutkm),3]
}
if(plot == T)
{
plot((ciemelt[,1]/501)*0.9, (ciemelt[,2]/501)*0.9, col = ciemelt$value, pch = 16, cex = 0.2, xlab = "x", ylab = "y", xlim = c(0,1), main = paste(k, "clustered colors on CIE 1931 colorspace"))
orderg = order(gamutkm[,2], decreasing = F)
gamutkm = gamutkm[orderg,]
colorvec = colorvec[orderg]
polygon((gamutspace[grDevices::chull(gamutspace[,1], gamutspace[,2]),1]/501)*0.9, (gamutspace[grDevices::chull(gamutspace[,1], gamutspace[,2]),2]/501)*0.9)
text(y = seq(0.1,0.8, length.out = k), x = rep(0.9, k), cex = 0.8, labels = colorvec)
for(i in 1:k) segments(x0 = (gamutkm[i,1]/501)*0.9, y0 =(gamutkm[i,2]/501)*0.9, x1 = 0.8, y1 = seq(0.1,0.8, length.out = 8)[i], lwd = 0.5)
points((gamutkm[,1]/501)*0.9, (gamutkm[,2]/501)*0.9, pch = 21, bg = colorvec, col = "black")
text(y = 0.9, x = 0.1, pos = 4, cex = 0.7, labels = paste("Gamut: ", gamut, ", Exposure: ", exposure, ", clustering method: ", clusterMethod, sep = ""))
points(y = seq(0.1,0.8, length.out = k), x = rep(0.8, k), cex = 4, pch = 21, col = "black", bg = colorvec)
}
return(colorvec)
}
Now let’s do a few test runs changing parameters:
layout(matrix(1:4, nrow = 2))
XYZ_clusterpalette(8, gamut = "NTSC", clusterMethod = "CLARA")
## [1] "#9DA7F7" "#FF9FB4" "#FF7158" "#DBD4B3" "#00FFCC" "#FFC044" "#E5F76E"
## [8] "#60FF49"
XYZ_clusterpalette(8, gamut = "sRGB", clusterMethod = "CLARA")
## [1] "#8976FF" "#9BA7F8" "#FF87D1" "#FF5C80" "#E7D9A4" "#FFC482" "#65FFA2"
## [8] "#DCFD66"
XYZ_clusterpalette(8, gamut = "SWOP", clusterMethod = "kmeans")
## [1] "#A98DFF" "#FF7AD2" "#81C8E7" "#FF8585" "#F1C8B1" "#48F5C6" "#FFE369"
## [8] "#6CFF92"
XYZ_clusterpalette(8, gamut = "SWOP", clusterMethod = "kmeans", exposure = 1.1)
## [1] "#7883DA" "#CC76BF" "#FF6C9A" "#45BEB9" "#FF7464" "#BEB198" "#DFC15B"
## [8] "#48E384"